~ When to use?
We might want to allow for nonlinear predictors while also including random effects to account for spatiotemporal structure etc.
When estimating GAMMs as a mixed model, we need to compute confidence /credible intervals as for GAM. Beta now contains both i) all the fixed effects and the random effects for the smooth terms (only).
Sometimes a random effects model is required with a large number of smooth curves, which really are the random effects. That is, we might have a random smooth curve per subject, these can be set up to be efficiently computed in gamm anf gamm4
After all, the most important should be knowing what do you want to accomplish by using random effects. Then use find the most appropriate model structure.
There are two main libraries that will do GAMs and GAMMs.
mgcv includes gamm function which fits GAMMs based on linear models as implemented in nlme library.gamm4 includes gamm4 function which fits GAMMs based on linear models implemented in lme4.~ Pros / Cons:
mgcv: This gives full access to random effects and correlation structure, just as it is available in lme. Work hard with optimization because it is hard separate out a trend from correlation. Hard work for model optimizer. It’s easy to make model that runs into numerical problems of estimation. Function performs re-parameterization directly or as a part of PQL iteration.
gamm4: Does not give correlation structure, does not rely on PQL for GLMM estimation.
gamm4 or mgcv?What functions?
mgcv::gam, mgcv::bam, mgcv::gamm, or gamm4::gamm Pedersen EJ, Miller DL, Simpson GL, Ross N. 2019. Hierarchical generalized additive models in ecology: an introduction with mgcv. PeerJ 7:e6876 https://doi.org/10.7717/peerj.6876
(Directly from Pedersen et al., 2019)
Know your question: - It is assumed that the random effects and correlation structures are employed primarily to model residual correlation in the data and that the prime interest is in inference about the terms in the fixed effects model formula including the smooths
Should each group have its own smoother, or will a common smoother suffice?
Do all of the group-specific smoothers have the same wiggliness, or should each group have its own smoothing parameter?
Will the smoothers for each group have a similar shape to one another—a shared global smoother?
Smooths specification: In mgcv::gam (and mgcv::gamm) it is an argument in each s() predictor variable, but the wiggly components of the smooth are treated as random effects
s(..., bs="re")When to use mixed model? what determined random effect structure? (Directly from Pedersen et al., 2019)
- A single common smoother for all observations; We will refer to this as model G, as it only has a Global smoother.
gam(log(uptake) ~ s(log(conc), k=5, bs="tp") + s(Plant_uo, k=12, bs="re")
- A global smoother plus group-level smoothers that have the same wiggliness. We will refer to this as model GS (for Global smoother with individual effects that have a Shared penalty).
gam(log(uptake) ~ s(log(conc), k=5, m=2) + s(log(conc), Plant_uo, k=5, bs="fs", m=2)
- A global smoother plus group-level smoothers with differing wiggliness. We will refer to this as model GI (for Global smoother with individual effects that have Individual penalties).
gam(log(uptake) ~ s(log(conc), k=5, m=2, bs="tp") + s(log(conc), by= Plant_uo, k=5, m=1, bs="tp") + s(Plant_uo, bs="re", k=12)
- Group-specific smoothers without a global smoother, but with all smoothers having the same wiggliness. We will refer to this as model S.
gam(log(uptake) ~ s(log(conc), Plant_uo, k=5, bs="fs", m=2)
- Group-specific smoothers with different wiggliness. We will refer to this as model I.
gam(log(uptake) ~ s(log(conc), by=Plant_uo, k=5, bs="tp", m=2) + s(Plant_uo, bs="re", k=12)
Benefits using mixed GAMM in place of GAM >> hierarchical modelling! Mixed modelling functions are typically set to compute and converge efficiently using the sparsity structures of the random effect in ways that not-mixed models are not (e.g. gam)
! Code available from Git repo, open source in association with Pedersen et al., 2019: https://peerj.com/articles/6876/ (We only modified some plots, all the good info and more code on their Git)
library(mgcv)
library(gamm4)
library(gamair)
library(MASS)
library(lme4)
library(lattice)
library(knitr)
library(datasets)
library(tidyverse)
CO2_modG <- gam(log(uptake) ~ s(log(conc), k=5, bs="tp") +
s(Plant_uo, k=12, bs="re"),
data=CO2, method="REML", family="gaussian")
# setup prediction data
CO2_modG_pred <- with(CO2,
expand.grid(conc=seq(min(conc), max(conc), length=100),
Plant_uo=levels(Plant_uo)))
# make the prediction, add this and a column of standard errors to the prediction
# data.frame. Predictions are on the log scale.
CO2_modG_pred <- cbind(CO2_modG_pred,
predict(CO2_modG,
CO2_modG_pred,
se.fit=TRUE,
type="response"))
# make the plot. Note here the use of the exp() function to back-transform the
# predictions (which are for log-uptake) to the original scale
ggplot(data=CO2, aes(x=conc, y=uptake, group=Plant_uo, color=Plant_uo)) +
# facet_wrap(~Plant_uo) +
geom_ribbon(aes(ymin=exp(fit - 2*se.fit), ymax=exp(fit + 2*se.fit), x=conc, fill=Plant_uo, group=Plant_uo),
data=CO2_modG_pred,
alpha=0.1,
inherit.aes=FALSE) +
geom_line(aes(y=exp(fit)), data=CO2_modG_pred) +
geom_point() +
labs(x=expression(CO[2] ~ concentration ~ (mL ~ L^{-1})),
y=expression(CO[2] ~ uptake ~ (mu*mol ~ m^{-2})))+
theme_classic()
CO2_modGS <- gam(log(uptake) ~ s(log(conc), k=5, m=2) +
s(log(conc), Plant_uo, k=5, bs="fs", m=2),
data=CO2, method="REML")
## Warning in gam.side(sm, X, tol = .Machine$double.eps^0.5): model has repeated 1-
## d smooths of same variable.
# setup prediction data
CO2_modGS_pred <- with(CO2,
expand.grid(conc=seq(min(conc), max(conc), length=100),
Plant_uo=levels(Plant_uo)))
CO2_modGS_pred <- cbind(CO2_modGS_pred,
predict(CO2_modGS,
CO2_modGS_pred,
se.fit=TRUE,
type="response"))
ggplot(data=CO2, aes(x=conc, y=uptake, group=Plant_uo, color=Plant_uo)) +
# facet_wrap(~Plant_uo) +
geom_ribbon(aes(ymin=exp(fit - 2*se.fit), ymax=exp(fit + 2*se.fit), x=conc, fill=Plant_uo, group=Plant_uo),
data=CO2_modGS_pred,
alpha=0.1,
inherit.aes=FALSE) +
geom_line(aes(y=exp(fit)), data=CO2_modGS_pred) +
geom_point() +
labs(x=expression(CO[2] ~ concentration ~ (mL ~ L^{-1})),
y=expression(CO[2] ~ uptake ~ (mu*mol ~ m^{-2})))+
theme_classic()
CO2_modGI <- gam(log(uptake) ~ s(log(conc), k=5, m=2, bs="tp") +
s(log(conc), by= Plant_uo, k=5, m=1, bs="tp") +
s(Plant_uo, bs="re", k=12),
data=CO2, method="REML")
# make the prediction, add this and a column of standard errors to the prediction
# data.frame. Predictions are on the log scale.
# setup prediction data
CO2_modGI_pred <- with(CO2,
expand.grid(conc=seq(min(conc), max(conc), length=100),
Plant_uo=levels(Plant_uo)))
CO2_modGI_pred <- cbind(CO2_modGI_pred,
predict(CO2_modGI,
CO2_modGI_pred,
se.fit=TRUE,
type="response"))
ggplot(data=CO2, aes(x=conc, y=uptake, group=Plant_uo, color=Plant_uo)) +
# facet_wrap(~Plant_uo) +
geom_ribbon(aes(ymin=exp(fit - 2*se.fit), ymax=exp(fit + 2*se.fit), x=conc, fill=Plant_uo, group=Plant_uo),
data=CO2_modGI_pred,
alpha=0.1,
inherit.aes=FALSE) +
geom_line(aes(y=exp(fit)), data=CO2_modGI_pred) +
geom_point() +
labs(x=expression(CO[2] ~ concentration ~ (mL ~ L^{-1})),
y=expression(CO[2] ~ uptake ~ (mu*mol ~ m^{-2})))+
theme_classic()
CO2_modS <- gam(log(uptake) ~ s(log(conc), Plant_uo, k=5, bs="fs", m=2),
data=CO2, method="REML")
# setup prediction data
CO2_modS_pred <- with(CO2,
expand.grid(conc=seq(min(conc), max(conc), length=100),
Plant_uo=levels(Plant_uo)))
CO2_modS_pred <- cbind(CO2_modS_pred,
predict(CO2_modS,
CO2_modS_pred,
se.fit=TRUE,
type="response"))
ggplot(data=CO2, aes(x=conc, y=uptake, group=Plant_uo, color=Plant_uo)) +
# facet_wrap(~Plant_uo) +
geom_ribbon(aes(ymin=exp(fit - 2*se.fit), ymax=exp(fit + 2*se.fit), x=conc, fill=Plant_uo, group=Plant_uo),
data=CO2_modS_pred,
alpha=0.1,
inherit.aes=FALSE) +
geom_line(aes(y=exp(fit)), data=CO2_modS_pred) +
geom_point() +
labs(x=expression(CO[2] ~ concentration ~ (mL ~ L^{-1})),
y=expression(CO[2] ~ uptake ~ (mu*mol ~ m^{-2})))+
theme_classic()
CO2_modI <- gam(log(uptake) ~ s(log(conc), by=Plant_uo, k=5, bs="tp", m=2) +
s(Plant_uo, bs="re", k=12),
data=CO2, method="REML")
# setup prediction data
CO2_modI_pred <- with(CO2,
expand.grid(conc=seq(min(conc), max(conc), length=100),
Plant_uo=levels(Plant_uo)))
CO2_modI_pred <- cbind(CO2_modI_pred,
predict(CO2_modI,
CO2_modI_pred,
se.fit=TRUE,
type="response"))
ggplot(data=CO2, aes(x=conc, y=uptake, group=Plant_uo, color=Plant_uo)) +
# facet_wrap(~Plant_uo) +
geom_ribbon(aes(ymin=exp(fit - 2*se.fit), ymax=exp(fit + 2*se.fit), x=conc, fill=Plant_uo, group=Plant_uo),
data=CO2_modI_pred,
alpha=0.1,
inherit.aes=FALSE) +
geom_line(aes(y=exp(fit)), data=CO2_modI_pred) +
geom_point() +
labs(x=expression(CO[2] ~ concentration ~ (mL ~ L^{-1})),
y=expression(CO[2] ~ uptake ~ (mu*mol ~ m^{-2})))+
theme_classic()
How differently these models perform?
| Model | df | AIC | data_source | deltaAIC |
|---|---|---|---|---|
| CO2_modG | 16.61168 | -118.6467 | CO2 | 101.4650385 |
| CO2_modGS | 39.43030 | -198.5511 | CO2 | 21.5606113 |
| CO2_modGI | 42.14821 | -216.2503 | CO2 | 3.8613974 |
| CO2_modS | 52.96891 | -219.2937 | CO2 | 0.8180002 |
| CO2_modI | 55.68798 | -220.1117 | CO2 | 0.0000000 |
summary(CO2_modI)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## log(uptake) ~ s(log(conc), by = Plant_uo, k = 5, bs = "tp", m = 2) +
## s(Plant_uo, bs = "re", k = 12)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.20914 0.09497 33.79 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(log(conc)):Plant_uoQn1 3.406 3.796 53.89 < 2e-16 ***
## s(log(conc)):Plant_uoQn2 3.464 3.831 95.58 < 2e-16 ***
## s(log(conc)):Plant_uoQn3 3.430 3.810 72.25 < 2e-16 ***
## s(log(conc)):Plant_uoQc1 3.228 3.672 65.78 < 2e-16 ***
## s(log(conc)):Plant_uoQc3 3.336 3.750 77.51 < 2e-16 ***
## s(log(conc)):Plant_uoQc2 3.720 3.951 147.01 < 2e-16 ***
## s(log(conc)):Plant_uoMn3 3.252 3.690 65.03 < 2e-16 ***
## s(log(conc)):Plant_uoMn2 3.402 3.793 72.50 < 2e-16 ***
## s(log(conc)):Plant_uoMn1 3.238 3.679 99.74 < 2e-16 ***
## s(log(conc)):Plant_uoMc2 2.908 3.402 25.07 1.08e-10 ***
## s(log(conc)):Plant_uoMc3 3.439 3.816 22.79 7.59e-11 ***
## s(log(conc)):Plant_uoMc1 2.587 3.089 44.57 < 2e-16 ***
## s(Plant_uo) 10.958 11.000 259.09 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.986 Deviance explained = 99.5%
## -REML = -25.211 Scale est. = 0.0029127 n = 84
gam.check(CO2_modGI)
##
## Method: REML Optimizer: outer newton
## full convergence after 17 iterations.
## Gradient range [-9.765588e-06,4.025318e-05]
## (score -60.45161 & scale 0.002867246).
## Hessian positive definite, eigenvalue range [6.086816e-06,42.21203].
## Model rank = 65 / 65
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## s(log(conc)) 4.00e+00 3.85e+00 1.01 0.43
## s(log(conc)):Plant_uoQn1 4.00e+00 1.19e+00 1.01 0.50
## s(log(conc)):Plant_uoQn2 4.00e+00 2.01e+00 1.01 0.47
## s(log(conc)):Plant_uoQn3 4.00e+00 1.84e-05 1.01 0.48
## s(log(conc)):Plant_uoQc1 4.00e+00 4.40e-05 1.01 0.53
## s(log(conc)):Plant_uoQc3 4.00e+00 1.87e+00 1.01 0.57
## s(log(conc)):Plant_uoQc2 4.00e+00 3.61e+00 1.01 0.46
## s(log(conc)):Plant_uoMn3 4.00e+00 2.98e-05 1.01 0.48
## s(log(conc)):Plant_uoMn2 4.00e+00 4.86e-05 1.01 0.55
## s(log(conc)):Plant_uoMn1 4.00e+00 2.11e+00 1.01 0.54
## s(log(conc)):Plant_uoMc2 4.00e+00 3.21e+00 1.01 0.47
## s(log(conc)):Plant_uoMc3 4.00e+00 3.30e+00 1.01 0.50
## s(log(conc)):Plant_uoMc1 4.00e+00 2.98e+00 1.01 0.47
## s(Plant_uo) 1.20e+01 1.10e+01 NA NA
CO2_modI$edf
## (Intercept) s(log(conc)):Plant_uoQn1.1
## 1.00000000 0.85518073
## s(log(conc)):Plant_uoQn1.2 s(log(conc)):Plant_uoQn1.3
## 0.44830561 1.10265277
## s(log(conc)):Plant_uoQn1.4 s(log(conc)):Plant_uoQn2.1
## 1.00000000 0.87321387
## s(log(conc)):Plant_uoQn2.2 s(log(conc)):Plant_uoQn2.3
## 0.49591830 1.09519862
## s(log(conc)):Plant_uoQn2.4 s(log(conc)):Plant_uoQn3.1
## 1.00000000 0.86255322
## s(log(conc)):Plant_uoQn3.2 s(log(conc)):Plant_uoQn3.3
## 0.46727600 1.09973679
## s(log(conc)):Plant_uoQn3.4 s(log(conc)):Plant_uoQc1.1
## 1.00000000 0.79413832
## s(log(conc)):Plant_uoQc1.2 s(log(conc)):Plant_uoQc1.3
## 0.31342445 1.12069848
## s(log(conc)):Plant_uoQc1.4 s(log(conc)):Plant_uoQc3.1
## 1.00000000 0.83231812
## s(log(conc)):Plant_uoQc3.2 s(log(conc)):Plant_uoQc3.3
## 0.39344967 1.11062211
## s(log(conc)):Plant_uoQc3.4 s(log(conc)):Plant_uoQc2.1
## 1.00000000 0.94189016
## s(log(conc)):Plant_uoQc2.2 s(log(conc)):Plant_uoQc2.3
## 0.72313866 1.05498032
## s(log(conc)):Plant_uoQc2.4 s(log(conc)):Plant_uoMn3.1
## 1.00000000 0.80277838
## s(log(conc)):Plant_uoMn3.2 s(log(conc)):Plant_uoMn3.3
## 0.33039897 1.11874427
## s(log(conc)):Plant_uoMn3.4 s(log(conc)):Plant_uoMn2.1
## 1.00000000 0.85379606
## s(log(conc)):Plant_uoMn2.2 s(log(conc)):Plant_uoMn2.3
## 0.44481572 1.10318088
## s(log(conc)):Plant_uoMn2.4 s(log(conc)):Plant_uoMn1.1
## 1.00000000 0.79764271
## s(log(conc)):Plant_uoMn1.2 s(log(conc)):Plant_uoMn1.3
## 0.32023473 1.11992773
## s(log(conc)):Plant_uoMn1.4 s(log(conc)):Plant_uoMc2.1
## 1.00000000 0.66193140
## s(log(conc)):Plant_uoMc2.2 s(log(conc)):Plant_uoMc2.3
## 0.11466943 1.13096623
## s(log(conc)):Plant_uoMc2.4 s(log(conc)):Plant_uoMc3.1
## 1.00000000 0.86546722
## s(log(conc)):Plant_uoMc3.2 s(log(conc)):Plant_uoMc3.3
## 0.47495956 1.09853483
## s(log(conc)):Plant_uoMc3.4 s(log(conc)):Plant_uoMc1.1
## 1.00000000 0.50799585
## s(log(conc)):Plant_uoMc1.2 s(log(conc)):Plant_uoMc1.3
## -0.02407639 1.10277256
## s(log(conc)):Plant_uoMc1.4 s(Plant_uo).1
## 1.00000000 0.91314228
## s(Plant_uo).2 s(Plant_uo).3
## 0.91314228 0.91314228
## s(Plant_uo).4 s(Plant_uo).5
## 0.91314228 0.91314228
## s(Plant_uo).6 s(Plant_uo).7
## 0.91314228 0.91314228
## s(Plant_uo).8 s(Plant_uo).9
## 0.91314228 0.91314228
## s(Plant_uo).10 s(Plant_uo).11
## 0.91314228 0.91314228
## s(Plant_uo).12
## 0.91314228
qq.gam(CO2_modGI)
gam$lme vs gam$gam
About the data: Daily temperature data fro Cairo, specifically the average air temperature (F) in Cairo from Jan 1st 1995. The dataset can be found in package “gamair”. (https://cran.r-project.org/web/packages/gamair/gamair.pdf)
kable(cairo[1:5,], caption = "Cairo data")
| month | day.of.month | year | temp | day.of.year | time |
|---|---|---|---|---|---|
| 1 | 1 | 1995 | 59.2 | 1 | 1 |
| 1 | 2 | 1995 | 57.5 | 2 | 2 |
| 1 | 3 | 1995 | 57.4 | 3 | 3 |
| 1 | 4 | 1995 | 59.3 | 4 | 4 |
| 1 | 5 | 1995 | 58.8 | 5 | 5 |
str(cairo)
## 'data.frame': 3780 obs. of 6 variables:
## $ month : int 1 1 1 1 1 1 1 1 1 1 ...
## $ day.of.month: int 1 2 3 4 5 6 7 8 9 10 ...
## $ year : Factor w/ 11 levels "1995","1996",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ temp : num 59.2 57.5 57.4 59.3 58.8 55.7 57.3 56.5 57.5 58.5 ...
## $ day.of.year : num 1 2 3 4 5 6 7 8 9 10 ...
## $ time : int 1 2 3 4 5 6 7 8 9 10 ...
Random effects: Fixed effects:
## 7.7.2 Cairo temperature
ctamm <- gamm(temp ~ s(day.of.year, bs="cc",k=20) + s(time, bs="cr"), data=cairo, correlation = corAR1(form=~1|year))
ctamm.gam <- gam(temp ~ s(day.of.year, bs="cc",k=20) + s(time, bs="cr") , data=cairo)
ctamm25 <- gamm(temp ~ s(day.of.year) + s(time), data=cairo, correlation = corAR1(form=~1|year))
plot(ctamm$gam,scale=0,pages=1)
plot(ctamm.gam,scale=0,pages=1)
plot(ctamm25$gam,pages=1)
data(cairo)
ctamm.gamm4 <- gamm4(temp ~ s(day.of.year) + s(time) , data=cairo, random=~(1|year))
plot(ctamm.gamm4$gam,pages=1)
________________________ ### 2. Sitka AKA Random wiggly data
About the data: Sitka tree growth data under enhanced ozone and control conditions, ‘random wiggly data’?! sounds like ecology
xyplot(log.size~days|as.factor(ozone),data=sitka,type="l",groups=id.num)
sitka$id.num <- as.factor(sitka$id.num) # random factor needs to be a factor
b <- gamm(log.size~s(days) + ozone + ozone:days + s(days,id.num,bs="fs",k=5),data=sitka)
## Warning in gam.side(sm, X, tol = .Machine$double.eps^0.5): model has repeated 1-
## d smooths of same variable.
plot(b$gam,pages=1)
b.gam <- gam(log.size~s(days) + ozone + ozone:days + s(days,id.num,bs="fs",k=5), data=sitka)
## Warning in gam.side(sm, X, tol = .Machine$double.eps^0.5): model has repeated 1-
## d smooths of same variable.
plot(b.gam,pages=1)